72 research outputs found

    Towards a Distant Reading of the Golden Ages Hendecasyllable: Metrical Patterns, Frequencies and Historical Development

    This paper presents an analysis of the main types of hendecasyllable used in Spanish Golden Age sonnets. As a novelty, a macroanalysis or (computer-based) "distant reading" approach is applied to a corpus of more than 70,000 hendecasyllables. Based on a formal definition of metrical pattern, I analyze the most frequent metrical patterns and their historical development. The results are not yet conclusive, but they show the main metrical preferences of the different authors and how these change over the 16th and 17th centuries.
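
    The frequency analysis described in this abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each verse's metrical pattern is encoded as an 11-character string with "+" for a stressed and "-" for an unstressed syllable; the encoding, the function name and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter

def pattern_frequencies(patterns):
    """Return (pattern, count, relative frequency) tuples, most frequent first."""
    counts = Counter(patterns)
    total = sum(counts.values())
    return [(p, n, n / total) for p, n in counts.most_common()]

# Hypothetical toy data: one pattern string per verse (assumed encoding).
verses = [
    "-+---+---+-",
    "-+---+---+-",
    "---+-+---+-",
    "-+-+---+-+-",
]

for pattern, count, share in pattern_frequencies(verses):
    print(pattern, count, f"{share:.2f}")
```

    Grouping the same counts by author or by decade would give the kind of historical comparison the abstract mentions, provided that metadata is available.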

    Ordenación de eventos multidocumento usando inferencia de relaciones temporales y modelos semánticos distribucionales

    This paper focuses on the contribution of temporal relation inference and distributional semantic models to the event ordering task. Our system automatically builds ordered timelines of events from different written texts in English by performing temporal clustering first and semantic clustering second. To determine temporal compatibility, it applies inference over the temporal relations between events, which are extracted automatically by a Temporal Information Processing system. For semantic compatibility between events, we analyze two distributional semantic models: LDA topic modeling and Word2Vec word embeddings. Both semantic models, together with the temporal inference, have been evaluated within the framework of SemEval 2015 Task 4 Track B. Experiments show that using both models improves on the current state of the art, a significant advance in the Cross-Document Event Ordering task.
    This paper has been partially supported by the Spanish government, project TIN2015-65100-R, project TIN2015-65136-C2-2-R and PROMETEOII/2014/001.

    Hacia un análisis distante del endecasílabo áureo: patrones métricos, frecuencias y evolución histórica

    This paper presents an analysis of the main types of hendecasyllable used in Spanish Golden Age sonnets. As a novelty, a macroanalysis or (computer-based) "distant reading" approach is applied to a corpus of more than 70,000 hendecasyllables. Based on a formal definition of metrical pattern, I analyze the most frequent metrical patterns and their historical development. The results are not yet conclusive, but they show the main metrical preferences of the different authors and how these change over the 16th and 17th centuries.

    On Poetic Topic Modeling: Extracting Themes and Motifs From a Corpus of Spanish Poetry

    This paper analyzes the application of LDA topic modeling to a corpus of poetry. First, it explains how the most coherent LDA topics were established by running several tests and automatically evaluating the coherence of the resulting topics. The results show, on the one hand, that lemmatization is not advisable for a corpus of poetry because several poetic features are lost in the process and, on the other hand, that a standard LDA algorithm performs better than a version of LDA designed for short texts (LF-LDA). The resulting LDA topics were then analyzed manually in order to define the relation between word topics and poems. The analysis shows two main kinds of semantic relation: an LDA topic can represent the subject or theme of a poem, but it can also represent a poetic motif. All these analyses were carried out on a large corpus of Golden Age Spanish sonnets. Finally, the paper presents the most relevant themes and motifs in this corpus, such as "love," "religion," "heroics," "moral," or "mockery" on the one hand, and "rhyme," "marine," "music," or "painting" on the other.
    This work was supported by the BBVA Foundation: grants for research groups 2016, project Distant Reading Approach to Golden Age Spanish Sonnets (Ayudas fundación BBVA a equipos de investigación científica, proyecto Análisis distante de base computacional del soneto castellano del Siglo de Oro): http://adso.gplsi.es. It was also partially conducted in the context of the COST Action Distant Reading for European Literary History (CA16204 - Distant-Reading): www.distant-reading.net

    An approach to the recommendation of scientific articles according to their degree of specificity

    This article presents a method for recommending scientific articles according to their degree of generality or specificity. The approach rests on the idea that readers with less expertise in a topic prefer more general articles as an introduction, while readers with more expertise prefer more specific ones. Unlike recommendation techniques that focus on the analysis of user profiles, our proposal is based purely on content analysis. We present two methods for recommending articles, both based on Topic Modelling. The first relies on the divergence of the topics found in the documents, while the second uses the similarity between these topics. Both measures made it possible to determine how general or specific an article is for recommendation purposes, and in both cases the results outperformed those of a traditional information retrieval system.
    This work has been partially funded by the following projects: ATTOS (TIN2012-38536-C03-03), LEGOLANGUAGE (TIN2012-31224), FIRST (FP7-287607), DIIM2.0 (PROMETEOII/2014/001), and by the Programa Nacional de Movilidad de Recursos Humanos del Plan Nacional de I+D+i (CAS12/00113).
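
    The abstract does not say how topic divergence is computed, so the following is only a sketch of one plausible proxy, not the authors' measure. It assumes each document is represented by a probability distribution over K topics and scores specificity as the Kullback-Leibler divergence of that distribution from the uniform distribution: 0 when the document spreads evenly over all topics, and log2(K) when it concentrates on a single topic. The function name and the example distributions are hypothetical.

```python
import math

def kl_from_uniform(topic_dist):
    """KL divergence (in bits) of a topic distribution from the uniform one.

    Assumption for illustration only: a higher value means the document is
    concentrated on fewer topics and is treated as more specific.
    """
    k = len(topic_dist)
    # KL(p || u) = sum_i p_i * log2(p_i / (1/k)); zero-probability terms contribute 0.
    return sum(p * math.log2(p * k) for p in topic_dist if p > 0)

# Hypothetical topic distributions over four topics.
general_article = [0.25, 0.25, 0.25, 0.25]
specific_article = [1.0, 0.0, 0.0, 0.0]

print(kl_from_uniform(general_article))
print(kl_from_uniform(specific_article))
```

    Under this assumption, articles could be ranked from general to specific by sorting on the score, which keeps the recommendation purely content-based, as the abstract describes.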

    Metrical Annotation of a Large Corpus of Spanish Sonnets: Representation, Scansion and Evaluation

    In order to analyze metrical and semantic aspects of poetry in Spanish with computational techniques, we have developed a large corpus annotated with metrical information. In this paper we present and discuss the development of this corpus: the formal representation of metrical patterns, the semi-automatic annotation process based on a new automatic scansion system, the main annotation problems, and the evaluation, in which an inter-annotator agreement of 96% was obtained. The corpus is open and available.

    Cross-document event ordering through temporal, lexical and distributional knowledge

    In this paper we present a system that automatically builds ordered timelines of events from different written texts in English. The system deals with problems such as automatic event extraction, cross-document temporal relation extraction and cross-document event coreference resolution. Its main characteristic is the application of three different types of knowledge (temporal, lexical-semantic and distributional-semantic) in order to anchor and order the events in the timeline. It has been evaluated within the framework of SemEval 2015. The proposed system improves on the current state-of-the-art systems in all measures (by up to eight points of F1-score) and shows a significant advance in the cross-document event ordering task.
    This paper has been partially supported by the Spanish government, project TIN2015-65100-R and project TIN2015-65136-C2-2-R.

    Contextual word embeddings for tabular data search and integration

    This paper presents a new approach to retrieving and then integrating tabular datasets (collections of rows and columns) using union and join operations. Both processes were carried out using a similarity measure based on contextual word embeddings, which makes it possible to find semantically similar tables and to overcome the recall problem of lexical approaches based on string similarity. This work is the first attempt to use contextual word embeddings in the whole pipeline of table search and integration, including, for the first time, their use in the join operation. A comprehensive analysis of their performance was carried out on both retrieving and integrating tabular datasets, comparing them with context-free models. Column headings and cell values were used as contextual information and their impact on each task was evaluated. The results revealed that contextual models significantly outperform context-free models and a traditional weighting schema in ad hoc table retrieval. In the data integration task, contextual models also improved the results on the union operation compared to context-free approaches.
    Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature. This research has been partially funded by project "Desarrollo de un ecosistema de datos abiertos para transformar el sector turístico" (GVA-COVID19/2021/103) funded by Conselleria de Innovación, Universidades, Ciencia y Sociedad Digital de la Generalitat Valenciana (Spain); and by projects "CHAN-TWIN" (TED2021-130890B-C21), "COnscious natuRal TEXt generation (CORTEX)" (PID2021-123956OB-I00) and "Technological Resources for Intelligent VIral AnaLysis through NLP (TRIVIAL)" (PID2021-122263OB-C22), funded by MCIN/AEI/ 10.13039/501100011033 and by the European Union NextGenerationEU/PRTR.
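
    The embedding-based similarity search described in this abstract can be sketched as follows. The abstract does not name the similarity function, so cosine similarity is an assumption here, and the three-dimensional vectors stand in for real contextual embeddings of column headings and cell values; the table names, vectors and function names are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_tables(query_vec, table_vecs):
    """Return (table name, similarity) pairs, most similar first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in table_vecs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical embeddings; a real system would obtain them from a contextual model.
tables = {
    "hotel_bookings": [0.9, 0.1, 0.0],
    "museum_visits": [0.2, 0.8, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = rank_tables(query, tables)
print(ranking[0][0])
```

    The same ranking step could serve both retrieval and the selection of union or join candidates, with only the embedded text (headings versus cell values) changing between tasks.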

    NATSUM: Narrative abstractive summarization through cross-document timeline generation

    This paper presents a new approach to narrative abstractive summarization (NATSUM). NATSUM generates a narrative, chronologically ordered summary about a target entity from several news documents related to the same topic. To achieve this, the system first creates a cross-document timeline in which each time point contains all the event mentions that refer to the same event. This timeline is enriched with all the arguments of the events extracted from the different documents. Second, using natural language generation techniques, it produces one sentence for each event from the arguments involved in that event. Specifically, a hybrid surface realization approach is used, based on over-generation and ranking techniques. The evaluation demonstrates that NATSUM performed better than extractive summarization approaches and competitive abstractive baselines, improving the F1-measure by at least 50% when a real scenario is simulated.
    This research work has been partially funded by the Ministerio de Economía y Competitividad. España through projects TIN2015-65100-R, TIN2015-65136-C2-2-R, as well as by the project "Analisis de Sentimientos Aplicado a la Prevencion del Suicidio en las Redes Sociales (ASAP)" funded by Ayudas Fundación BBVA a equipos de investigacion cientifica. It has also been funded by Generalitat Valenciana through project "SIIA: Tecnologías del lenguaje humano para una sociedad inclusiva, igualitaria, y accesible" with grant reference PROMETEU/2018/089.